MEMORANDUM

SUBJECT: Use of Machines in Solving Ciphers - Monoalphabetic
Substitution

There has been recent discussion on the matter that automatic computers
or data processing machines might play a part in the solution of ciphers.
This memorandum discusses one of the simplest cases, namely, a
monoalphabetic substitution cipher. It is hoped that the study of a single
case will be of some help in understanding the role of machines and their
limitations.

Description of Cipher Text. 



The cipher text consists of 100 letters. These are divided into 20
five-letter groups with no apparent indication of any proper word division.
The text appears in the form of perforations in a tape and it is assumed
that the machine can read this. It is also assumed that the machine will
type out the solution on a paper tape, in capital letters, with the proper
word separation but no further punctuation.
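The input layout just described can be checked mechanically. The sketch below is only an illustration: the function name `read_groups`, its parameters, and the choice to represent the tape as a string of space-separated groups are assumptions, not part of the memorandum.

```python
def read_groups(tape_text, groups=20, size=5):
    """Return the cipher letters as one upper-case string, after checking
    that the input is `groups` groups of `size` letters each."""
    parts = tape_text.split()
    bad = any(len(p) != size or not p.isalpha() for p in parts)
    if len(parts) != groups or bad:
        raise ValueError("expected %d groups of %d letters" % (groups, size))
    return "".join(parts).upper()
```

Joining the groups discards the five-letter spacing, which carries no information about the true word division.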

In order to restrict the operations we start out by making certain
assumptions. We do not inquire into the justification for these.

1. The cryptogram is a single alphabet substitution cipher with
word separations and punctuation omitted.

2. The plain text is in straightforward English without code
words and there has been no effort made to make the cipher into a puzzle by
introducing unusual words and phraseologies. There may be unusual words in
the normal course of business, however. There may also be words which have
not found their way into the dictionary.

It is necessary to provide for the possibility that no solution is
obtained. In practice this would usually be because the assumptions are
incorrect. The plain text might for example be in Spanish.

Solution 



The solution to be outlined below is not well thought out, and it
probably would be modified in important details at least. However, at the
present stage what is needed is an illustrative solution, and an exact
optimum procedure is not of great importance.

It is suggested that the following material, arranged in order of
importance, be stored in the machine:

1. The relative frequency of the letters in English.

2. The relative frequency of all digraphs. There are 676 of
these. Some of these such as jj are very rare but all may occur. The
rarest ones may be said to have a probability of approximately zero.

3. The relative frequency of about 10000 trigraphs. There are
some 17000 of these but it would seem unnecessary to list those whose
probability is substantially zero. Perhaps 10000 are unnecessarily many.

4. A list of the 10000 most frequent tetragraphs.



5. A list of 10000 most frequent words. These should be grouped
by length, one letter words, two letter, three letter, etc. Within each
such group the listing may be alphabetical and it would perhaps be
advantageous to have alphabetical listings starting with the second, third,
etc., letters as well as by the first. Perhaps 10000 is an unnecessarily
large number of words (and trigraphs and tetragraphs).
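As a rough illustration of how the stored material might be held in a present-day program, the sketch below builds an n-graph frequency table from a sample of English and a word list grouped by length and indexed by letter position. The names `ngram_frequencies` and `index_words`, the `keep` parameter, and the choice to count n-graphs across word boundaries (since the cipher text has no word division) are assumptions made for this example. For reference, there are 26 x 26 = 676 possible digraphs and 26 x 26 x 26 = 17576 possible trigraphs.

```python
from collections import Counter, defaultdict

def ngram_frequencies(sample, n, keep=10000):
    """Relative frequency of the `keep` most common n-graphs in `sample`.
    Non-letters are dropped first, so n-graphs run across word boundaries."""
    letters = [c for c in sample.lower() if "a" <= c <= "z"]
    grams = Counter("".join(letters[i:i + n])
                    for i in range(len(letters) - n + 1))
    total = sum(grams.values())
    return {g: k / total for g, k in grams.most_common(keep)}

def index_words(words):
    """Group words by length; within each length, index them by
    (position, letter) so a word can be found from any known letter."""
    index = defaultdict(lambda: defaultdict(set))
    for w in words:
        for pos, ch in enumerate(w):
            index[len(w)][(pos, ch)].add(w)
    return index
```

Indexing by (position, letter) plays the part of the separate alphabetical listings by second, third, etc. letter: `index_words(words)[3][(1, "e")]` gives every three-letter word whose second letter is e.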






The following are the suggested steps in the solution:

1. Make a frequency count of the letters. 

2. Make a skeleton of a tentative solution by putting e's for
the most frequent letter, t for the next most frequent, etc., until about
one-half of the letters have been replaced. This has perhaps made use of
ten different letters which we may assume to be



etaoinshrd



3. Examine the digraphs. The number of these will vary from
message to message but is usually not far from an average of 25. There are
only 100 possibilities and only a few of them are very improbable. Among
these are aa, ii, [illegible], ao and hh. In addition to looking for the
individual possibilities we divide the digraphs into classes as follows:



A - vowel, vowel 
B - vowel, consonant 
C - consonant, vowel
D - consonant, consonant 

B and C should predominate. If A and D predominate or if B and C don't
predominate sufficiently it is an indication that consonants and vowels
should be interchanged. Thus the procedure is to interchange t with e, a,
o, etc. and note whether the conditions improve. Similarly n, s, etc. may
be tentatively exchanged with a vowel. When classes B and C predominate
sufficiently it is probable that the assignments as between vowels and
consonants are for the most part correct, but within these groups there is
ample room for error. a and i are difficult to distinguish.

4. The trigraphs may be assumed to average ten or twelve, with
half the letters in place, but this number is subject to variations, more so
than the number of digraphs. To the extent that they occur they are now
examined for probability. If any improbable ones occur the found letters
are interchanged tentatively until the trigraphs become more probable. In
these operations it is to be expected that some progress will be made in
improving the assignments within the vowel group and within the consonant
group as well as between the groups. During these operations tentative
interchanges between the letters already placed and the more frequent of
those not yet placed (perhaps l, u) should also be attempted.

5. The next criterion to attempt is that of the tetragraphs.
These are very few, perhaps five on the average, and this number is subject
to very large variations. However, most of them have a very low probability
but have to be accepted in any case. The procedure would be the same as
under the trigraphs.

6. The next step is to attempt to assign a value to the next most
frequent letter, the eleventh under our assumptions. We first try the most
common of those that are left. This gives rise to a number of digraphs,
trigraphs, etc. and those can be expected to determine whether it was a
good choice. If it was not, the next most frequent one is tried and so forth.
If it is difficult to find a satisfactory fit, attempts should be made to
interchange with one of those already assigned.

The other letters are tried one by one in the same manner.

7. The next logical step is to attempt to break the tentative
text into words. It appears advantageous to begin with the least frequent
letters. That is, begin with the letter z and try to determine the words in
the text of which it is a part. Then proceed to x, etc. One point in
favor of this is that there are few words to select from and the search
need not be long. Another point is that these infrequent letters are rather
likely to have been incorrectly assigned so there is an opportunity for
making corrections early in the operation. Once this process gets under
way the corrections that are made are of assistance in what follows and it
should be possible to identify subsequent words that occur in the text if
they are also in the stored list of words.

8. After this process has been carried as far as possible there
appears to be nothing more that can be done except to type out the material
with the word spacings found. If no solution was found this will be
disclosed by an examination by a human.
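Steps 1 to 3 above can be sketched as follows. This is a minimal illustration and not a worked-out procedure: the function names, the use of '.' for a letter not yet placed, the alphabetical tie-break between equally frequent letters, and the treatment of y as a consonant are all assumptions. The last function estimates how many digraphs, trigraphs and tetragraphs fall wholly on placed letters, on the further assumption that each of the 100 positions is placed independently with probability one-half; this gives about 25, 12 and 6, in line with the figures quoted in steps 3 to 5.

```python
from collections import Counter

ASSUMED_ORDER = "etaoinshrd"   # the ten letters assumed in step 2
VOWELS = set("aeiou")          # assumption: y counts as a consonant
CLASS = {"VV": "A", "VC": "B", "CV": "C", "CC": "D"}

def skeleton(cipher, order=ASSUMED_ORDER):
    """Steps 1 and 2: count the cipher letters, then map the most frequent
    ones onto `order`. Unplaced positions are shown as '.'."""
    counts = Counter(cipher)
    # Ties are broken alphabetically so the result is repeatable.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    key = dict(zip(ranked, order))
    return "".join(key.get(c, ".") for c in cipher), key

def digraph_classes(skel):
    """Step 3: tally classes A to D over adjacent pairs of placed letters."""
    tally = Counter()
    for a, b in zip(skel, skel[1:]):
        if a == "." or b == ".":
            continue  # at least one letter of the pair is not yet placed
        kind = ("V" if a in VOWELS else "C") + ("V" if b in VOWELS else "C")
        tally[CLASS[kind]] += 1
    return tally

def expected_ngrams(n, length=100, placed=0.5):
    """Expected number of n-graphs made up entirely of placed letters,
    assuming positions are placed independently of one another."""
    return (length - n + 1) * placed ** n
```

A tally in which A and D together outweigh B and C would be the signal, in the sense of step 3, to try interchanging a consonant with a vowel in the key and recount.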

Evaluation




The method outlined cannot be relied on to give a complete solution.
It may be in the first place that the method sometimes fails to arrive at
even an approximation. We are not much concerned with this case because it
is believed that this only happens because the message is too short or
because some of the processes outlined have not been well thought out. We
are concerned, however, with the eventuality that the results are correct
in the main but the results as printed still are incomplete.

It is thought that such cases would occur and further that in most of
these cases it would be possible for a human being to look at the printed
answer and deduce the correct answer very quickly. This is because the
human being has a relatively large span of perception so that he can judge
by how words fail to fit together what corrections must be made. He can
also detect incorrect word division by the same means. Further, he can
locate garbles (caused either by an enciphering error or by faulty
transmission). Here also the means is essentially a long perception span.


We may also say that we have not been able to program the machine to
make use of the semantics of the text but only the structure on a rather
small scale. It does not appear that the matter of semantics vs. structure
is the same as the matter of large perception span vs. short span but the
effect is similar.

We have only been able to program into the machine a means for making
use of the fine structure, and what is lacking is the invention of means
for making an overall examination of the message and taking corrective steps
in accordance with the findings.

It may be concluded that in the absence of such means it is an essential
part of the solution that a human being should examine the tentative solution
as it comes from the machine.



If this must be done the question arises whether it might not be better
to have the examination made at an earlier stage in the process. The
machine is at its best in the early stages when what is required is counting
and other simple operations. It is not doing so well in the stages when the
last few letters are being introduced one at a time. It may be that if the
partial work were submitted for inspection at a time when 90% of the letters
have been inserted a human being could complete the solution very
quickly. This would save quite a little comparison work in connection with
these last letters and also the work of word division.